SPARK-1311 Map side distinct in collect vertex ids from edges graphx #21
Conversation
Merged build triggered.
Merged build started.
Merged build triggered.
Merged build triggered.
Merged build started.
Merged build triggered.
Merged build finished.
One or more automated tests failed
Merged build finished.
All automated tests passed.
@ankurdave does this look okay to you?
Looks good. Thanks!
Actually - no ...
We should use the primitive hashmap - otherwise it is pretty slow
Ok, I'll switch it tonight.
…ISTANCE_MASK), provide default params for rehashIfNeeded, and switch the GraphImpl to use it instead of the Java hash map, which is probably what Reynold meant in the CR feedback.
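To make the trade-off in that feedback concrete, here is a minimal sketch of the map-side-distinct idea, not the PR's actual code: the helper name `collectVertexIds` is hypothetical, and a plain `mutable.HashSet` stands in for the primitive-specialized open hash set that the commit above switches to (which avoids boxing each `Long` vertex id).

```scala
import org.apache.spark.graphx.{Edge, VertexId}
import org.apache.spark.rdd.RDD

// Illustrative sketch: deduplicate vertex ids inside each partition before the
// shuffle, instead of flatMapping every (srcId, dstId) pair and calling
// distinct() on the raw stream.
def collectVertexIds(edges: RDD[Edge[_]]): RDD[VertexId] = {
  edges.mapPartitions { iter =>
    val seen = new scala.collection.mutable.HashSet[VertexId]
    iter.foreach { e =>
      seen += e.srcId
      seen += e.dstId
    }
    seen.iterator   // each partition now emits each id at most once
  }.distinct()      // a (much smaller) global distinct is still needed
}
```

The per-partition set is what the primitive hash map replaces in the follow-up commit; the rest of the shuffle-reduction idea stays the same.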
Merged build triggered.
Merged build started.
Merged build finished.
One or more automated tests failed
retest this please
Merged build triggered.
Merged build started.
Merged build finished.
One or more automated tests failed
…ommands. Adding support for defining schema in foreign DDL commands. Foreign DDL already supports commands like:
```
CREATE TEMPORARY TABLE avroTable USING org.apache.spark.sql.avro OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
With this PR the user can define a schema instead of inferring it from the file, so DDL commands like the following are supported:
```
CREATE TEMPORARY TABLE avroTable(a int, b string) USING org.apache.spark.sql.avro OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
Author: scwf <[email protected]>
Author: Yin Huai <[email protected]>
Author: Fei Wang <[email protected]>
Author: wangfei <[email protected]>

Closes #3431 from scwf/ddl and squashes the following commits:
7e79ce5 [Fei Wang] Merge pull request #22 from yhuai/pr3431yin
38f634e [Yin Huai] Remove Option from createRelation.
65e9c73 [Yin Huai] Revert all changes since applying a given schema has not been tested.
a852b10 [scwf] remove cleanIdentifier
f336a16 [Fei Wang] Merge pull request #21 from yhuai/pr3431yin
baf79b5 [Yin Huai] Test special characters quoted by backticks.
50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to StringType.
1eeb769 [Fei Wang] Merge pull request #20 from yhuai/pr3431yin
f5c22b0 [Yin Huai] Refactor code and update test cases.
f1cffe4 [Yin Huai] Revert "minor refactory"
b621c8f [scwf] minor refactory
d02547f [scwf] fix HiveCompatibilitySuite test failure
8dfbf7a [scwf] more tests for complex data type
ddab984 [Fei Wang] Merge pull request #19 from yhuai/pr3431yin
91ad91b [Yin Huai] Parse data types in DDLParser.
cf982d2 [scwf] fixed test failure
445b57b [scwf] address comments
02a662c [scwf] style issue
44eb70c [scwf] fix decimal parser issue
83b6fc3 [scwf] minor fix
9bf12f8 [wangfei] adding test case
7787ec7 [wangfei] added SchemaRelationProvider
0ba70df [wangfei] draft version
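As a rough illustration of the `SchemaRelationProvider` hook named in the commit list above, here is a hedged Scala sketch; the class name `MyAvroProvider` and its body are hypothetical and only show the shape of a data source that accepts the user-declared schema rather than inferring one.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, SchemaRelationProvider}
import org.apache.spark.sql.types.StructType

// Hypothetical provider: when the DDL declares columns, Spark SQL hands the
// parsed StructType to createRelation instead of asking the source to infer it.
class MyAvroProvider extends SchemaRelationProvider {
  override def createRelation(
      ctx: SQLContext,
      parameters: Map[String, String],
      userSchema: StructType): BaseRelation = {
    val path = parameters("path") // value of OPTIONS (path "...") in the DDL
    new BaseRelation {
      override val sqlContext: SQLContext = ctx
      override val schema: StructType = userSchema // honor the declared schema
    }
  }
}
```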
* Add kubernetes profile to travis yml file
* Fix long lines in CompressionUtils.scala
Defined ArrowPayload and encapsulated Arrow classes in ArrowConverters; addressed some minor comments in code review. Closes apache#21.
Feature/spark flow bumped
[SPARK-601] Use FBS and env for network auth
…elper#replaceNotIncludedMsg` to remove `@hashCode`

### What changes were proposed in this pull request?
The daily build GA `Master Scala 2.13 + Hadoop 3 + JDK 8` failed after #39468 was merged; the failed tests include:
- org.apache.spark.sql.SQLQueryTestSuite
- org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite

and the failure error message is similar to the following:
```
2023-01-11T01:03:46.3478323Z 01:03:46.347 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: 2023-01-11T01:03:46.3677880Z [info]- explain.sql *** FAILED *** (348 milliseconds) 2023-01-11T01:03:46.3678710Z [info] explain.sql 2023-01-11T01:03:46.3679479Z [info] Expected "...es.CatalogFileIndex[7d811218, [key, val] 2023-01-11T01:03:46.3680509Z [info] +- Project [key#x, val#x] 2023-01-11T01:03:46.3681033Z [info] +- SubqueryAlias spark_catalog.default.explain_temp4 2023-01-11T01:03:46.3684259Z [info] +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet 2023-01-11T01:03:46.3684922Z [info] 2023-01-11T01:03:46.3685766Z [info] == Optimized Logical Plan == 2023-01-11T01:03:46.3687590Z [info] InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex7d811218, [key, val] 2023-01-11T01:03:46.3688465Z [info] +- WriteFiles 2023-01-11T01:03:46.3690929Z [info] +- Sort [val#x ASC NULLS FIRST], false 2023-01-11T01:03:46.3691387Z [info] +- Project [key#x, empty2null(val#x) AS val#x] 2023-01-11T01:03:46.3692078Z [info] +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet 2023-01-11T01:03:46.3692549Z [info] 2023-01-11T01:03:46.3693443Z [info] == Physical Plan == 2023-01-11T01:03:46.3695233Z [info] Execute InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex7d811218], [key, val] 2023-01-11T01:03:46.3696100Z [info] +- Writ...", but got "...es.CatalogFileIndex[cdfa8472, [key, val] 2023-01-11T01:03:46.3698327Z [info] +- Project [key#x, val#x] 2023-01-11T01:03:46.3698881Z [info] +- SubqueryAlias spark_catalog.default.explain_temp4 2023-01-11T01:03:46.3699680Z [info] +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet 2023-01-11T01:03:46.3704986Z [info] 2023-01-11T01:03:46.3705457Z [info] == Optimized Logical Plan == 2023-01-11T01:03:46.3717140Z [info] InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndexcdfa8472, [key, val] 2023-01-11T01:03:46.3718309Z [info] +- WriteFiles 2023-01-11T01:03:46.3718964Z [info] +- Sort [val#x ASC NULLS FIRST], false 2023-01-11T01:03:46.3719752Z [info] +- Project [key#x, empty2null(val#x) AS val#x] 2023-01-11T01:03:46.3723046Z [info] +- Relation spark_catalog.default.explain_temp4[key#x,val#x] parquet 2023-01-11T01:03:46.3723598Z [info] 2023-01-11T01:03:46.3726955Z [info] == Physical Plan == 2023-01-11T01:03:46.3728111Z [info] Execute InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndexcdfa8472], [key, val] 2023-01-11T01:03:46.3898445Z [info] +- Writ..."
Result did not match for query #21 2023-01-11T01:03:46.3902948Z [info] EXPLAIN EXTENDED INSERT INTO TABLE explain_temp5 SELECT * FROM explain_temp4 (SQLQueryTestSuite.scala:495) 2023-01-11T01:03:46.3903881Z [info] org.scalatest.exceptions.TestFailedException: 2023-01-11T01:03:46.3904492Z [info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) 2023-01-11T01:03:46.3905449Z [info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) 2023-01-11T01:03:46.3906493Z [info] at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564) 2023-01-11T01:03:46.3907683Z [info] at org.scalatest.Assertions.assertResult(Assertions.scala:847) 2023-01-11T01:03:46.3908243Z [info] at org.scalatest.Assertions.assertResult$(Assertions.scala:842) 2023-01-11T01:03:46.3908812Z [info] at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564) 2023-01-11T01:03:46.3910011Z [info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runQueries$11(SQLQueryTestSuite.scala:495) 2023-01-11T01:03:46.3910611Z [info] at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563) 2023-01-11T01:03:46.3911163Z [info] at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561) 2023-01-11T01:03:46.3912094Z [info] at scala.collection.AbstractIterable.foreach(Iterable.scala:926) 2023-01-11T01:03:46.3912781Z [info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runQueries$9(SQLQueryTestSuite.scala:486) 2023-01-11T01:03:46.3913371Z [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) 2023-01-11T01:03:46.3914237Z [info] at org.scalatest.Assertions.withClue(Assertions.scala:1065) 2023-01-11T01:03:46.3915165Z [info] at org.scalatest.Assertions.withClue$(Assertions.scala:1052) 2023-01-11T01:03:46.3915725Z [info] at org.scalatest.funsuite.AnyFunSuite.withClue(AnyFunSuite.scala:1564) 2023-01-11T01:03:46.3916341Z [info] at org.apache.spark.sql.SQLQueryTestSuite.runQueries(SQLQueryTestSuite.scala:462) 2023-01-11T01:03:46.3917485Z [info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runTest$35(SQLQueryTestSuite.scala:364) 2023-01-11T01:03:46.3918517Z [info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$runTest$35$adapted(SQLQueryTestSuite.scala:362) 2023-01-11T01:03:46.3919102Z [info] at scala.collection.immutable.List.foreach(List.scala:333) 2023-01-11T01:03:46.3919675Z [info] at org.apache.spark.sql.SQLQueryTestSuite.runTest(SQLQueryTestSuite.scala:362) 2023-01-11T01:03:46.3921754Z [info] at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$createScalaTestCase$6(SQLQueryTestSuite.scala:269) 2023-01-11T01:03:46.3922358Z [info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) 2023-01-11T01:03:46.3923784Z [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) 2023-01-11T01:03:46.3924473Z [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) 2023-01-11T01:03:46.3925286Z [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) 2023-01-11T01:03:46.3926199Z [info] at org.scalatest.Transformer.apply(Transformer.scala:22) 2023-01-11T01:03:46.3927071Z [info] at org.scalatest.Transformer.apply(Transformer.scala:20) 2023-01-11T01:03:46.3928583Z [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) 2023-01-11T01:03:46.3929225Z [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207) 2023-01-11T01:03:46.3930091Z [info] at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) 2023-01-11T01:03:46.3933329Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) 2023-01-11T01:03:46.3933893Z [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) 2023-01-11T01:03:46.3934875Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) 2023-01-11T01:03:46.3935479Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) 2023-01-11T01:03:46.3936453Z [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66) 2023-01-11T01:03:46.3937318Z [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) 2023-01-11T01:03:46.3940707Z [info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) 2023-01-11T01:03:46.3941350Z [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66) 2023-01-11T01:03:46.3941962Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) 2023-01-11T01:03:46.3943332Z [info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) 2023-01-11T01:03:46.3944504Z [info] at scala.collection.immutable.List.foreach(List.scala:333) 2023-01-11T01:03:46.3950194Z [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) 2023-01-11T01:03:46.3950748Z [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) 2023-01-11T01:03:46.3951912Z [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) 2023-01-11T01:03:46.3952515Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) 2023-01-11T01:03:46.3953476Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) 2023-01-11T01:03:46.3954069Z [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) 2023-01-11T01:03:46.3966445Z [info] at org.scalatest.Suite.run(Suite.scala:1114) 2023-01-11T01:03:46.3967583Z [info] at org.scalatest.Suite.run$(Suite.scala:1096) 2023-01-11T01:03:46.3968377Z [info] at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) 2023-01-11T01:03:46.3969537Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) 2023-01-11T01:03:46.3970510Z [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) 2023-01-11T01:03:46.3971298Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) 2023-01-11T01:03:46.3972182Z [info] at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) 2023-01-11T01:03:46.3973529Z [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:66) 2023-01-11T01:03:46.3974433Z [info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) 2023-01-11T01:03:46.3977778Z [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) 2023-01-11T01:03:46.3984781Z [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) 2023-01-11T01:03:46.3985521Z [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:66) 2023-01-11T01:03:46.3986684Z [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) 2023-01-11T01:03:46.3987264Z [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) 2023-01-11T01:03:46.3987774Z [info] at 
sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413) 2023-01-11T01:03:46.3988269Z [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2023-01-11T01:03:46.3989260Z [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 2023-01-11T01:03:46.3996895Z [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 2023-01-11T01:03:46.3997550Z [info] at java.lang.Thread.run(Thread.java:750)
```
The reason for the failure is that the result of `CatalogFileIndex` printed when using Scala 2.12 is different from Scala 2.13.

When using Scala 2.12, the `hashCode` of `CatalogFileIndex` is `7d811218`:
```
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndex7d811218, [key, val]
```
When using Scala 2.13, the `hashCode` of `CatalogFileIndex` is `cdfa8472`:
```
InsertIntoHadoopFsRelationCommand Location [not included in comparison]/{warehouse_dir}/explain_temp5], Append, `spark_catalog`.`default`.`explain_temp5`, org.apache.spark.sql.execution.datasources.CatalogFileIndexcdfa8472, [key, val]
```
So this adds a new `replaceAll` action to `SQLQueryTestHelper#replaceNotIncludedMsg` that removes the `hashCode`, so that `CatalogFileIndex` prints the same result when using Scala 2.12 and 2.13.

### Why are the changes needed?
Make the daily build GA `Master Scala 2.13 + Hadoop 3 + JDK 8` run successfully.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual test with this pr:
```
gh pr checkout 39598
dev/change-scala-version.sh 2.13
build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z explain.sql" -Pscala-2.13
build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z explain-aqe.sql" -Pscala-2.13
build/sbt -Phive-thriftserver "hive-thriftserver/testOnly org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite -- -z explain.sql" -Pscala-2.13
build/sbt -Phive-thriftserver "hive-thriftserver/testOnly org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite -- -z explain-aqe.sql" -Pscala-2.13
```
Closes #39598 from LuciferYang/SPARK-41708-FOLLOWUP.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
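A minimal sketch of the kind of `replaceAll` normalization described above; the object name and the exact regex are assumptions for illustration, not the code actually added to `SQLQueryTestHelper`.

```scala
object ReplaceHashCodeSketch {
  // Assumed regex for illustration: strip the identity hashCode that is printed
  // after CatalogFileIndex in plan strings, so Scala 2.12 and 2.13 outputs
  // compare equal.
  def stripIdentityHash(planText: String): String =
    planText.replaceAll("CatalogFileIndex[0-9a-f]+", "CatalogFileIndex")

  def main(args: Array[String]): Unit = {
    val scala212 = "...datasources.CatalogFileIndex7d811218, [key, val]"
    val scala213 = "...datasources.CatalogFileIndexcdfa8472, [key, val]"
    // After normalization the two plan fragments are identical.
    assert(stripIdentityHash(scala212) == stripIdentityHash(scala213))
  }
}
```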
…21)
* [CARMEL-3150] Integration with admin and metadata cache notification
* fix compile error
* only send metastore event to one queue
Why did the plans of TPCDSV1_4_PlanStabilitySuite change?
For q5, the condition of EliminateOuterJoin is only fulfilled after InferFiltersFromConstraints, so the added InferFiltersFromConstraints pushes more predicates down.
For q8, the added predicates of filter apache#19 are inferred by InferFiltersFromConstraints according to EqualNullSafe; as for the added predicates of scan apache#13 and filter apache#15, after the predicate isnotnull(substr(ca_zip#9, 1, 5)) is pushed to filter apache#15 by the current rules, the later InferFiltersFromConstraints pushes isnotnull(ca_zip#9) to filter apache#15 and scan apache#13.
For q14a, the added predicates of filter apache#18 are inferred by InferFiltersFromConstraints according to EqualNullSafe. For q14b, it's the same as q14a.
For q85, PushExtraPredicateThroughJoin pushes predicates from join apache#21, then the added InferFiltersFromConstraints infers the predicates from join apache#27.
For q93, it's the same as q5.

Why did the plans of TPCDSV2_7_PlanStabilityWithStatsSuite, TPCDSV1_4_PlanStabilityWithStatsSuite and TPCDSV2_7_PlanStabilitySuite change?
For TPCDSV2_7_PlanStabilityWithStatsSuite and TPCDSV2_7_PlanStabilitySuite, it's the same as TPCDSV1_4_PlanStabilitySuite. For TPCDSV1_4_PlanStabilityWithStatsSuite, most plans changed for the same reason as TPCDSV1_4_PlanStabilitySuite, except q85, whose plan changes further because the rule CostBasedJoinReorder is triggered.
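To make the role of InferFiltersFromConstraints concrete, here is a small hedged DataFrame sketch; it is not taken from the TPC-DS queries, it only shows the rule adding `isnotnull` predicates that the query never stated, which is the mechanism behind the plan changes listed above.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a local SparkSession, for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("infer-filters").getOrCreate()
import spark.implicits._

val orders    = Seq((1, "a"), (2, "b")).toDF("id", "payload")
val customers = Seq((1, "x"), (3, "y")).toDF("id", "name")

// An inner equi-join constrains both join keys to be non-null, so the
// optimizer (via InferFiltersFromConstraints) typically inserts
// Filter isnotnull(id#...) on both sides even though the query never says so.
val joined = orders.join(customers, "id")
println(joined.queryExecution.optimizedPlan)
```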
### What changes were proposed in this pull request? This PR aims to add `Python 3.10` to Infra docker images. ### Why are the changes needed? This is a preparation to add a daily `Python 3.10` GitHub Action job later for Apache Spark 4.0.0. Note that Python 3.10 is installed at the last step to avoid the following issues which happens when we install Python 3.9 and 3.10 at the same stage by package manager. ``` #21 13.03 ERROR: Cannot uninstall 'blinker'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall. #21 ERROR: process "/bin/sh -c python3.9 -m pip install numpy 'pyarrow>=14.0.0' 'pandas<=2.1.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'" did not complete successfully: exit code: 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 1. I verified that the Python CI is not affected and still use Python 3.9.5 only. ``` ======================================================================== Running PySpark tests ======================================================================== Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log Will test against the following Python executables: ['python3.9'] Will test the following Python modules: ['pyspark-errors'] python3.9 python_implementation is CPython python3.9 version is: Python 3.9.5 Starting test(python3.9): pyspark.errors.tests.test_errors (temp output: /__w/spark/spark/python/target/fd967f24-3607-4aa6-8190-3f8d7de522e1/python3.9__pyspark.errors.tests.test_errors___zauwgy1.log) Finished test(python3.9): pyspark.errors.tests.test_errors (0s) Tests passed in 0 seconds ``` 2. Pass `Base Image Build` step for new Python 3.10.  3. Since new Python 3.10 is not used in CI, we need to validate like the following. 
``` $ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 --version Python 3.10.13 ``` ``` $ docker run -it --rm ghcr.io/dongjoon-hyun/apache-spark-ci-image:master-6895105871 python3.10 -m pip freeze alembic==1.12.1 annotated-types==0.6.0 blinker==1.7.0 certifi==2019.11.28 chardet==3.0.4 charset-normalizer==3.3.2 click==8.1.7 cloudpickle==2.2.1 contourpy==1.2.0 coverage==7.3.2 cycler==0.12.1 databricks-cli==0.18.0 dbus-python==1.2.16 deepspeed==0.12.3 distro-info==0.23+ubuntu1.1 docker==6.1.3 entrypoints==0.4 et-xmlfile==1.1.0 filelock==3.9.0 Flask==3.0.0 fonttools==4.44.3 gitdb==4.0.11 GitPython==3.1.40 googleapis-common-protos==1.56.4 greenlet==3.0.1 grpcio==1.56.2 grpcio-status==1.48.2 gunicorn==21.2.0 hjson==3.1.0 idna==2.8 importlib-metadata==6.8.0 itsdangerous==2.1.2 Jinja2==3.1.2 joblib==1.3.2 kiwisolver==1.4.5 lxml==4.9.3 Mako==1.3.0 Markdown==3.5.1 MarkupSafe==2.1.3 matplotlib==3.8.1 memory-profiler==0.60.0 mlflow==2.8.1 mpmath==1.3.0 networkx==3.0 ninja==1.11.1.1 numpy==1.26.2 oauthlib==3.2.2 openpyxl==3.1.2 packaging==23.2 pandas==2.1.3 Pillow==10.1.0 plotly==5.18.0 protobuf==3.20.3 psutil==5.9.6 py-cpuinfo==9.0.0 pyarrow==14.0.1 pydantic==2.5.1 pydantic_core==2.14.3 PyGObject==3.36.0 PyJWT==2.8.0 pynvml==11.5.0 pyparsing==3.1.1 python-apt==2.0.1+ubuntu0.20.4.1 python-dateutil==2.8.2 pytz==2023.3.post1 PyYAML==6.0.1 querystring-parser==1.2.4 requests==2.31.0 requests-unixsocket==0.2.0 scikit-learn==1.1.3 scipy==1.11.3 six==1.14.0 smmap==5.0.1 SQLAlchemy==2.0.23 sqlparse==0.4.4 sympy==1.12 tabulate==0.9.0 tenacity==8.2.3 threadpoolctl==3.2.0 torch==2.0.1+cpu torcheval==0.0.7 torchvision==0.15.2+cpu tqdm==4.66.1 typing_extensions==4.8.0 tzdata==2023.3 unattended-upgrades==0.1 unittest-xml-reporting==3.2.0 urllib3==2.1.0 websocket-client==1.6.4 Werkzeug==3.0.1 zipp==3.17.0 ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43840 from dongjoon-hyun/SPARK-45953. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
Change-Id: I00839ee7bd966b6ebbd3219efb33f2464b2f56dd
Co-authored-by: Peter Toth <[email protected]>
…pressions in `buildAggExprList`

### What changes were proposed in this pull request?
Trim aliases before matching Sort/Having/Filter expressions with a semantically equal expression from the Aggregate below in `buildAggExprList`.

### Why are the changes needed?
For a query like:
```
SELECT course, year, GROUPING(course) FROM courseSales GROUP BY CUBE(course, year) ORDER BY GROUPING(course)
```
the plan after `ResolveReferences` and before `ResolveAggregateFunctions` looks like:
```
!Sort [cast((shiftright(tempresolvedcolumn(spark_grouping_id#18L, spark_grouping_id, false), 1) & 1) as tinyint) AS grouping(course)#22 ASC NULLS FIRST], true
+- Aggregate [course#19, year#20, spark_grouping_id#18L], [course#19, year#20, cast((shiftright(spark_grouping_id#18L, 1) & 1) as tinyint) AS grouping(course)#21 AS grouping(course)#15]
....
```
Because the aggregate list has an `Alias(Alias(cast((shiftright(spark_grouping_id#18L, 1) & 1) as tinyint))` entry, the expression from `SortOrder` won't get matched as semantically equal, and that results in adding an unnecessary `Project`. By stripping inner aliases from the aggregate list (they are going to get removed anyway in `CleanupAliases`) we can match the `SortOrder` expression and resolve it as `grouping(course)#15`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #51339 from mihailotim-db/mihailotim-db/fix_inner_aliases_semi_structured.

Authored-by: Mihailo Timotic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
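A conceptual Scala sketch of the alias-trimming idea described above (not the actual rule code in `buildAggExprList`): peel off nested `Alias` layers before comparing expressions semantically, so the doubly-aliased aggregate-list entry can match the `ORDER BY` expression. The helper names are hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}

// Conceptual helper, not Spark's implementation: strip any number of
// enclosing Aliases.
def trimAliases(e: Expression): Expression = e match {
  case a: Alias => trimAliases(a.child)
  case other    => other
}

// With this, Alias(Alias(castExpr, "grouping(course)"), "grouping(course)")
// from the aggregate list compares equal to the bare castExpr in SortOrder.
def matchesIgnoringAliases(sortExpr: Expression, aggExpr: Expression): Boolean =
  trimAliases(sortExpr).semanticEquals(trimAliases(aggExpr))
```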
No description provided.